Building Large Arabic Multi-domain Resources for Sentiment Analysis
Authors
Abstract
While there has been recent progress in the area of Arabic Sentiment Analysis, most of the resources in this area are of limited size, domain specific, or not publicly available. In this paper, we address this problem by generating large multi-domain datasets for Sentiment Analysis in Arabic. The datasets were scraped from different review websites and consist of a total of 33K annotated reviews for movies, hotels, restaurants and products. Moreover, we build multi-domain lexicons from the generated datasets. Different experiments have been carried out to validate the usefulness of the datasets and the generated lexicons for the task of sentiment classification. From the experimental results, we highlight some useful insights addressing the best performing classifiers and feature representation methods, the effect of introducing lexicon-based features, and the factors affecting the accuracy of sentiment classification in general. All the datasets, experiment code and results have been made publicly available for scientific purposes.
Similar resources
A Large Scale Arabic Sentiment Lexicon for Arabic Opinion Mining
Most opinion mining methods in English rely successfully on sentiment lexicons, such as English SentiWordnet (ESWN). While there have been efforts towards building Arabic sentiment lexicons, they suffer from many deficiencies: limited size, unclear usability plan given Arabic’s rich morphology, or nonavailability publicly. In this paper, we address all of these issues and produce the first publ...
Gulf Arabic Linguistic Resource Building for Sentiment Analysis
This paper deals with building linguistic resources for Gulf Arabic, one of the Arabic variations, for sentiment analysis task using machine learning. To our knowledge, no previous works were done for Gulf Arabic sentiment analysis despite the fact that it is present in different online platforms. Hence, the first challenge is the absence of annotated data and sentiment lexicons. To fill this g...
SANA: A Large Scale Multi-Genre, Multi-Dialect Lexicon for Arabic Subjectivity and Sentiment Analysis
The computational treatment of subjectivity and sentiment in natural language is usually significantly improved by applying features exploiting lexical resources where entries are tagged with semantic orientation (e.g., positive, negative values). In spite of the fair amount of work on Arabic sentiment analysis over the past few years, e.g., (Abbasi et al., 2008; Abdul-Mageed et al., 2014; Abdu...
How Translation Alters Sentiment
Sentiment analysis research has predominantly been on English texts. Thus there exist many sentiment resources for English, but less so for other languages. Approaches to improve sentiment analysis in a resource-poor focus language include: (a) translate the focus language text into a resource-rich language such as English, and apply a powerful English sentiment analysis system on the text, and...
Sentiment Classification at Discourse Segment Level: Experiments on multi-domain Arabic corpus
Sentiment classification aims to determine whether the semantic orientation of a text is positive, negative or neutral. It can be tackled at several levels of granularity: expression or phrase level, sentence level, and document level. In the scope of this research, we are interested in the sentence and sub-sentential level classification which can provide very useful trends for information ret...
Publication date: 2015